environmental data
SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data
Biodiversity is declining at an unprecedented rate, impacting ecosystem services necessary to ensure food, water, and human health and well-being. Understanding the distribution of species and their habitats is crucial for conservation policy planning. However, traditional methods in ecology for species distribution models (SDMs) generally focus either on narrow sets of species or narrow geographical areas and there remain significant knowledge gaps about the distribution of species. A major reason for this is the limited availability of data traditionally used, due to the prohibitive amount of effort and expertise required for traditional field monitoring. The wide availability of remote sensing data and the growing adoption of citizen science tools to collect species observations data at low cost offer an opportunity for improving biodiversity monitoring and enabling the modelling of complex ecosystems. We introduce a novel task for mapping bird species to their habitats by predicting species encounter rates from satellite images, and present SatBird1, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird, considering summer (breeding) and winter seasons. We also provide a dataset in Kenya representing low-data regimes. We additionally provide environmental data and species range maps for each location.
California Wildfire Inventory (CAWFI): An Extensive Dataset for Predictive Techniques based on Artificial Intelligence
Bhowmik, Rohan Tan, Jung, Youn Soo, Aguilera, Juan, Prunicki, Mary, Nadeau, Kari
Due to climate change and the disruption of ecosystems worldwide, wildfires are increasingly impacting environment, infrastructure, and human lives globally. Additionally, an exacerbating climate crisis means that these losses would continue to grow if preventative measures are not implemented. Though recent advancements in artificial intelligence enable wildfire management techniques, most deployed solutions focus on detecting wildfires after ignition. The development of predictive techniques with high accuracy requires extensive datasets to train machine learning models. This paper presents the California Wildfire Inventory (CAWFI), a wildfire database of over 37 million data points for building and training wildfire prediction solutions, thereby potentially preventing megafires and flash fires by addressing them before they spark. The dataset compiles daily historical California wildfire data from 2012 to 2018 and indicator data from 2012 to 2022. The indicator data consists of leading indicators (meteorological data correlating to wildfire-prone conditions), trailing indicators (environmental data correlating to prior and early wildfire activity), and geological indicators (vegetation and elevation data dictating wildfire risk and spread patterns). CAWFI has already demonstrated success when used to train a spatio-temporal artificial intelligence model, predicting 85.7% of future wildfires larger than 300,000 acres when trained on 2012-2017 indicator data. This dataset is intended to enable wildfire prediction research and solutions as well as set a precedent for future wildfire databases in other regions.
Generating Actionable Robot Knowledge Bases by Combining 3D Scene Graphs with Robot Ontologies
Nguyen, Giang, Pomarlan, Mihai, Jongebloed, Sascha, Leusmann, Nils, Vu, Minh Nhat, Beetz, Michael
-- In robotics, the effective integration of environmental data into actionable knowledge remains a significant challenge due to the variety and incompatibility of data formats commonly used in scene descriptions, such as MJCF, URDF, and SDF . This paper presents a novel approach that addresses these challenges by developing a unified scene graph model that standardizes these varied formats into the Universal Scene Description (USD) format. We evaluated our approach by converting procedural 3D environments into USD format, which is then annotated semantically and translated into a knowledge graph to effectively answer competency questions, demonstrating its utility for real-time robotic decision-making. Additionally, we developed a web-based visualization tool to support the semantic mapping process, providing users with an intuitive interface to manage the 3D environment. In AI-powered and cognition-enabled robotics, robot agents face the challenge of fulfilling underdetermined task requests such as "prepare a breakfast" or "bring me something to drink." To accomplish these tasks, robots must infer the specific body movements required, which heavily depend on the given environment and the robot's knowledge and reasoning capabilities. This knowledge includes the physics, geometry, and visual characteristics of the environment and its objects. Although the necessary details for computing these movements are contained within virtual reality environments' scene graph data structures, these structures are not standardised, inherently machine-understandable, or interpretable. This limitation restricts a robot's ability to answer task-critical queries in changing environments, such as whether milk is stored within a container, how to operate a refrigerator or the outcomes of handling a milk carton by the lid.
SatHealth: A Multimodal Public Health Dataset with Satellite-based Environmental Factors
Wang, Yuanlong, Wang, Pengqi, Yin, Changchang, Zhang, Ping
Living environments play a vital role in the prevalence and progression of diseases, and understanding their impact on patient's health status becomes increasingly crucial for developing AI models. However, due to the lack of long-term and fine-grained spatial and temporal data in public and population health studies, most existing studies fail to incorporate environmental data, limiting the models' performance and real-world application. To address this shortage, we developed SatHealth, a novel dataset combining multimodal spatiotemporal data, including environmental data, satellite images, all-disease prevalences estimated from medical claims, and social determinants of health (SDoH) indicators. We conducted experiments under two use cases with SatHealth: regional public health modeling and personal disease risk prediction. Experimental results show that living environmental information can significantly improve AI models' performance and temporal-spatial generalizability on various tasks. Finally, we deploy a web-based application to provide an exploration tool for SatHealth and one-click access to both our data and regional environmental embedding to facilitate plug-and-play utilization. SatHealth is now published with data in Ohio, and we will keep updating SatHealth to cover the other parts of the US. With the web application and published code pipeline, our work provides valuable angles and resources to include environmental data in healthcare research and establishes a foundational framework for future research in environmental health informatics.
ExARNN: An Environment-Driven Adaptive RNN for Learning Non-Stationary Power Dynamics
Li, Haoran, Guo, Muhao, Weng, Yang, Ilic, Marija, Ruan, Guangchun
--Non-stationary power system dynamics, influenced by renewable energy variability, evolving demand patterns, and climate change, are becoming increasingly complex. Accurately capturing these dynamics requires a model capable of adapting to environmental factors. T o address this, we propose the External Adaptive RNN (ExARNN), a novel framework that integrates external data (e.g., weather, time) to continuously adjust the parameters of a base RNN. ExARNN achieves this through a hierarchical hypernetwork design, using Neural Controlled Differential Equations (NCDE) to process external data and generate RNN parameters adaptively. This approach enables ExARNN to handle inconsistent timestamps between power and external measurements, ensuring continuous adaptation. Extensive forecasting tests demonstrate ExARNN's superiority over established baseline models.
Temporal Interception and Present Reconstruction: A Cognitive-Signal Model for Human and AI Decision Making
This paper proposes a novel theoretical model to explain how the human mind and artificial intelligence can approach real-time awareness by reducing perceptual delays. By investigating cosmic signal delay, neurological reaction times, and the ancient cognitive state of stillness, we explore how one may shift from reactive perception to a conscious interface with the near future. This paper introduces both a physical and cognitive model for perceiving the present not as a linear timestamp, but as an interference zone where early-arriving cosmic signals and reactive human delays intersect. We propose experimental approaches to test these ideas using human neural observation and neuro-receptive extensions. Finally, we propose a mathematical framework to guide the evolution of AI systems toward temporally efficient, ethically sound, and internally conscious decision-making processes
Multimodal Data Integration for Sustainable Indoor Gardening: Tracking Anyplant with Time Series Foundation Model
Nabaei, Seyed Hamidreza, Zheng, Zeyang, Chen, Dong, Heydarian, Arsalan
Indoor gardening within sustainable buildings offers a transformative solution to urban food security and environmental sustainability. By 2030, urban farming, including Controlled Environment Agriculture (CEA) and vertical farming, is expected to grow at a compound annual growth rate (CAGR) of 13.2% from 2024 to 2030, according to market reports. This growth is fueled by advancements in Internet of Things (IoT) technologies, sustainable innovations such as smart growing systems, and the rising interest in green interior design. This paper presents a novel framework that integrates computer vision, machine learning (ML), and environmental sensing for the automated monitoring of plant health and growth. Unlike previous approaches, this framework combines RGB imagery, plant phenotyping data, and environmental factors such as temperature and humidity, to predict plant water stress in a controlled growth environment. The system utilizes high-resolution cameras to extract phenotypic features, such as RGB, plant area, height, and width while employing the Lag-Llama time series model to analyze and predict water stress. Experimental results demonstrate that integrating RGB, size ratios, and environmental data significantly enhances predictive accuracy, with the Fine-tuned model achieving the lowest errors (MSE = 0.420777, MAE = 0.595428) and reduced uncertainty. These findings highlight the potential of multimodal data and intelligent systems to automate plant care, optimize resource consumption, and align indoor gardening with sustainable building management practices, paving the way for resilient, green urban spaces.